HOW SPP COULD BE IMPROVED
If I were to do SPP all over again I would make several significant changes.
Although I doubt that anyone would want to revise SPP, the following
comments would also apply to speeding up a C compiler, so they may be of
more general interest.
What follows are terse hints about how to improve SPP's performance and
coding style. I would be glad to discuss any of these ideas at greater length
with you. Just give me a call.
1. I would make SPP token-based, rather than character-based. SPP
spends a lot of time scanning identifiers and copying characters from one
buffer to another. (I've measured.) Tokens would be created for identifiers,
strings, constants and other entities very early on.
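A rough sketch of what such a token might look like in C (the names here
are my invention, not taken from the SPP sources):

    /* One token, built once by the scanner; later passes hand it
       around by pointer, so no more character copying. */
    enum tok_kind { TK_ID, TK_STRING, TK_NUMBER, TK_PUNCT };

    struct token {
        enum tok_kind  kind;
        const char    *spelling;  /* points into a shared string pool */
        unsigned       length;    /* saved so no strlen() is needed later */
    };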
2. I would make SPP a multiple-pass compiler. This would significantly
*increase* its speed because only the first two passes would deal with
characters. All other passes would deal with tokens. The pass structure
would closely follow the translation phases described in the ANSI C
Standard.
Pass one would be a state machine that reads the entire file into memory
and then gathers statistics about it. These statistics would be used to
allocate memory for all strings, identifiers and tokens. This is an important
point: although this initial pass would seem to be pure overhead, it would
*greatly* simplify later storage allocation. (No more testing to see if we
have to allocate more memory.) Storage for all tokens, symbol tables and
strings (including the spellings of identifiers) would then be allocated
*all at once.*
Pass two would also be a state machine, similar to the first pass. Pass two
would simply "fill in" the already allocated storage with all the tokens and
strings. Symbol tables would be created in a later pass. Later passes would
handle parsing and output.
By the way, I am not sure that the initial statistics-gathering pass would
really be a good idea, so it is worth looking into. If it doesn't work out, it
could be abandoned without giving up tokens.
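To make the count-then-fill scheme concrete, here is a sketch. The names
are invented, and it reuses the struct token sketched under point 1:

    #include <stdlib.h>

    struct stats { size_t n_tokens; size_t n_string_bytes; };

    /* Pass one: scan the in-memory source and count what pass two
       will produce.  (Counting logic omitted.) */
    extern void count_pass(const char *src, size_t len, struct stats *st);

    /* Pass two: fill in the pre-allocated storage.  It can never run
       out of room, so no per-token overflow tests are needed. */
    extern void fill_pass(const char *src, size_t len,
                          struct token *toks, char *strings);

    void translate(const char *src, size_t len)
    {
        struct stats st;
        struct token *toks;
        char *strings;

        count_pass(src, len, &st);
        toks    = malloc(st.n_tokens * sizeof *toks);   /* all at once */
        strings = malloc(st.n_string_bytes);            /* (error checks omitted) */
        fill_pass(src, len, toks, strings);
        /* later passes work on toks[] and never rescan src */
    }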
3. I would distinguish between an id's spelling table and its symbol table.
The spelling table would be created when the id is first tokenized. It would
contain information about whether the id could be a reserved word, is
currently defined as a macro, etc. The symbol table would be used during
parsing, and would contain a pointer to the spelling table.
The spelling table entry would have the following format:

    attribute byte:  reserved-word and macro-expansion bits, and
                     possibly other bits.
    length byte:     the length of the identifier.
    spelling:        the identifier itself, \0 terminated.
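In C, one entry might look something like this (the bit names are invented):

    #define SP_RESERVED 0x01   /* id may be a reserved word */
    #define SP_MACRO    0x02   /* id is currently #defined as a macro */

    struct spelling {
        unsigned char attr;    /* attribute bits above */
        unsigned char length;  /* length of the identifier */
        char text[1];          /* the spelling, '\0' terminated; each entry
                                  is allocated with room for the full name */
    };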
4. I would have SPP look up an id only once, when it is first tokenized.
All further references to an identifier would be via a pointer to the id's
spelling table entry, which would be part of the id's token. This would save
*lots* of time.
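Continuing the sketch above, an identifier token would just carry the
pointer, so testing whether an id is currently a macro becomes one bit test
instead of a fresh table search:

    struct id_token {
        struct spelling *sp;   /* set once, when the id is first tokenized */
    };

    int is_macro(const struct id_token *t)
    {
        return (t->sp->attr & SP_MACRO) != 0;
    }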
5. I would rewrite the semantic routines. (Notice all the bug fixes in
sem.c.) I tried to do all the semantic actions on the fly, which turned out to
be a big mistake. It would be much better, and probably *faster*, to use
three passes for doing semantic analysis. The first pass would build a full
parse tree for declarations, the second pass would do flow analysis, and a
third pass would output the program including the Sherlock macros.
Having a tokenized parse tree helps a lot, since each pass can move back
and forth in the tree as required.
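A skeleton of that arrangement might look like this (names invented):

    /* One node of the tokenized parse tree. */
    struct node {
        int op;                  /* what this node represents */
        struct node *kid;        /* first child */
        struct node *sib;        /* next sibling */
        /* ...semantic fields filled in by the passes below... */
    };

    extern void build_decls(struct node *root);   /* pass 1: declarations */
    extern void flow_check(struct node *root);    /* pass 2: flow analysis */
    extern void emit_program(struct node *root);  /* pass 3: output, with the
                                                     Sherlock macros inserted */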
6. I would scrap the file copier logic in sysnext. Although this logic
works, it is *very* slow since it is used on every input character. SPP
should produce its output in a final pass after all tokenizing and parsing is
complete. That way very little scanning would need to be done.
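For example, the final output pass could be a single loop over the token
array, touching each character exactly once. A sketch, reusing struct token
from point 1 (the real spacing rules are glossed over):

    #include <stdio.h>

    void output_pass(const struct token *toks, size_t n, FILE *out)
    {
        size_t i;
        for (i = 0; i < n; i++) {
            fwrite(toks[i].spelling, 1, toks[i].length, out);
            putc(' ', out);    /* real whitespace handling omitted */
        }
    }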
To summarize these first 6 points, I would use tokens and multiple passes
instead of a single-pass character-oriented approach. The remaining points
deal with programming style.
7. I would use stream routines in the macro expansion logic instead of a
profusion of buffers. (See the code in def.c.) The stream routines would
hide the details of testing for full buffers and allocating new buffers.
Although the details of macro expansion and token pasting are inherently
complex, using streams would have significantly simplified matters.
(Dealing with tokens instead of characters would also remove some
complications from the macro expansion code.)
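Here is a sketch of what I mean by stream routines (names invented; error
handling omitted):

    #include <stdlib.h>

    struct stream {
        char  *buf;
        size_t len, cap;
    };

    /* Append one character, growing the buffer as needed.  All the
       "is this buffer full?" testing lives here and nowhere else. */
    void s_putc(struct stream *s, int c)
    {
        if (s->len == s->cap) {
            s->cap = s->cap ? 2 * s->cap : 64;
            s->buf = realloc(s->buf, s->cap);   /* failure check omitted */
        }
        s->buf[s->len++] = (char)c;
    }

    void s_write(struct stream *s, const char *p, size_t n)
    {
        while (n--)
            s_putc(s, *p++);
    }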
8. I would use many more header files. This is made possible and
desirable by several features in the new ANSI C Standard. First, the
Standard defines many new headers, which I would naturally incorporate
into the program. Second, having function prototypes is a godsend, so I
would define a header file for just about every .c file. The Macintosh
version of SPP uses this new style and it has been quite an improvement.
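For example, def.c would get a matching def.h containing nothing but its
public prototypes. The contents here are only illustrative:

    /* def.h -- public interface of def.c */
    #ifndef DEF_H
    #define DEF_H

    void define_macro(const char *name, const char *body);
    int  macro_defined(const char *name);

    #endif /* DEF_H */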
I have found two headers to be particularly useful: env.h and ver.h. env.h
contains #defines that describe the current environment. So any machine-
dependent code is enclosed in
    #ifdef ENV_XXX
        /* machine-dependent code */
    #endif
The ver.h header contains strings describing the current version of the code.
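For instance, ver.h might hold nothing more than (the values here are
made up):

    /* ver.h -- identifies this build */
    #define VERSION  "SPP version 2.0"
    #define VERDATE  "11 Jun 91"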
Although my opinions are based on statistics and a great deal of reflection, I
am not attached to them. If you have any questions, comments, or other
suggestions, I would be happy to hear from you.